The input file is a text table with tab-delimited columns, one gene per line. The third column holds that gene's expression scores across a panel of tissues, as a comma-separated list of values (see the file format description).
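As a minimal sketch of reading this format, the helper below splits one line on tabs and turns the comma-separated third column into an array of numbers. The method name `parse_expression_line` is hypothetical. The format description above does not say which column holds the gene name, so the sketch *assumes* it is the first column.

```ruby
# Minimal sketch: split one tab-delimited line into columns and convert the
# comma-separated third column into an array of Floats.
# ASSUMPTION: the gene name is in the first column (not stated in the text).
def parse_expression_line(line)
  columns = line.chomp.split("\t")
  gene_name = columns[0]
  # split(",") drops trailing empty fields, so a trailing comma is harmless;
  # Float() raises on non-numeric values instead of silently returning 0.
  scores = columns[2].split(",").map { |value| Float(value) }
  [gene_name, scores]
end

# Hypothetical example line, for illustration only.
name, scores = parse_expression_line("geneA\tsome_field\t1.5,2.0,2.5\n")
puts name           # prints "geneA"
puts scores.inspect # prints "[1.5, 2.0, 2.5]"
```

Using `Float()` rather than `to_f` is a deliberate choice: a malformed score stops the program instead of being counted as 0.0.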
Write a Ruby program that:
1. Reads each line and, for that gene, computes the AVERAGE expression score for the first 10 tissues.
2. Writes, for each gene, the gene name and that average to an output file as a two-column, tab-delimited table.
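One possible sketch of Exercise 1 follows. It is not a reference solution; it assumes the gene name is in the first column and that the input and output paths are given on the command line. The method names `average_first_tissues` and `write_averages` are hypothetical. If a gene has fewer than 10 scores, this sketch averages the ones that are present.

```ruby
#!/usr/bin/env ruby
# Minimal sketch for Exercise 1.
# ASSUMPTIONS (not given in the exercise): gene name in column 1; paths on the
# command line.  Usage: ruby average_expression.rb INPUT OUTPUT

NUM_TISSUES = 10

# Average of the first n scores (fewer if the list is shorter); nil if empty.
def average_first_tissues(scores, n = NUM_TISSUES)
  first = scores.first(n)
  return nil if first.empty?
  first.sum / first.size
end

# Read the expression table and write "gene_name<TAB>average" lines.
def write_averages(input_path, output_path)
  File.open(output_path, "w") do |out|
    File.foreach(input_path) do |line|
      columns = line.chomp.split("\t")
      next if columns.size < 3 # skip blank or malformed lines
      gene_name = columns[0]
      scores = columns[2].split(",").map { |value| Float(value) }
      average = average_first_tissues(scores)
      out.puts "#{gene_name}\t#{average}" unless average.nil?
    end
  end
end

# Run only when invoked directly with exactly two file arguments.
write_averages(*ARGV) if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
```

`File.foreach` streams the input one line at a time, so the whole table never has to fit in memory.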
Exercise 2 - Subset Results for Genes of Interest
Suppose we are only interested in the average expression scores for a class of novel genes that encode large proteins: the so-called KIAA genes.
However, the gene names in the Affymetrix (affy) expression data file are UCSC-specific IDs, not these aliases.
Write a Ruby program that uses an alias file, which links the UCSC IDs to their gene aliases, to filter your two-column results file from Exercise 1.
Only report average expression data for genes having an alias that looks like "KIAA" followed by one or more digits. Use a _regular expression_ to find such aliases.
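Below is a minimal sketch of Exercise 2. The alias file's layout is not described here, so the sketch *assumes* a tab-delimited file with the UCSC ID in column 1 and one alias in column 2, and one row per (ID, alias) pair. The names `KIAA_PATTERN`, `kiaa_ids` and `filter_results` are hypothetical. The pattern `/\AKIAA\d+\z/` requires the whole alias to be "KIAA" plus digits. Dropping the `\A`/`\z` anchors would also accept aliases that merely contain such a token.

```ruby
#!/usr/bin/env ruby
# Minimal sketch for Exercise 2.
# ASSUMPTION: alias file is "UCSC_ID<TAB>alias", one alias per row.
# Usage: ruby kiaa_filter.rb ALIASES RESULTS OUTPUT
require "set"

# Whole alias must be "KIAA" followed by one or more digits.
KIAA_PATTERN = /\AKIAA\d+\z/

# Collect the UCSC IDs that have at least one KIAA-style alias.
def kiaa_ids(alias_path)
  ids = Set.new
  File.foreach(alias_path) do |line|
    # "alias" is a Ruby keyword, so the variable is named gene_alias.
    ucsc_id, gene_alias = line.chomp.split("\t")
    next if gene_alias.nil?
    ids << ucsc_id if KIAA_PATTERN.match?(gene_alias)
  end
  ids
end

# Copy only the Exercise 1 result lines whose gene ID has a KIAA alias.
def filter_results(alias_path, results_path, output_path)
  wanted = kiaa_ids(alias_path)
  File.open(output_path, "w") do |out|
    File.foreach(results_path) do |line|
      gene_name = line.chomp.split("\t").first
      out.puts line.chomp if wanted.include?(gene_name)
    end
  end
end

# Run only when invoked directly with exactly three file arguments.
filter_results(*ARGV) if __FILE__ == $PROGRAM_NAME && ARGV.size == 3
```

Loading the matching IDs into a `Set` first keeps each lookup during the filtering pass fast, regardless of how many aliases the file contains.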